In-Place Parallel Super Scalar Samplesort (IPSSSSo)
Authors
Abstract
We present a sorting algorithm that works in-place, executes in parallel, is cache-optimal, avoids branch mispredictions, and performs O(n log n) work for arbitrary inputs with high probability. Our extensive experiments show that the algorithm scales linearly in the number of cores on various multi-socket machines with 32 cores. On large inputs, we outperform our closest in-place competitor by a factor of 2.25 to 2.53 and our closest non-in-place competitor by a factor of 1.26 to 1.89. Even when executed sequentially, we outperform our closest sequential competitor, BlockQuicksort, by a factor of 1.19 to 1.23 for large inputs.
1998 ACM Subject Classification: F.2.2 Nonnumerical Algorithms and Problems
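To make "avoids branch mispredictions" concrete, here is a minimal C++ sketch of the branch-free bucket classification commonly associated with super scalar samplesort: the sorted splitters are stored as an implicit binary search tree, and each comparison result is added to the index arithmetically instead of steering a conditional jump. The names `make_splitter_tree` and `classify`, the use of `int` keys, and the power-of-two bucket count are assumptions made for illustration; this is not the authors' implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Recursively place the median of splitters[lo, hi) at `node` of a 1-based,
// heap-ordered implicit tree (children of node i are 2i and 2i+1).
static void build_tree(const std::vector<int>& splitters, std::vector<int>& tree,
                       std::size_t node, std::size_t lo, std::size_t hi) {
    if (lo >= hi) return;
    std::size_t mid = lo + (hi - lo) / 2;
    tree[node] = splitters[mid];
    build_tree(splitters, tree, 2 * node, lo, mid);
    build_tree(splitters, tree, 2 * node + 1, mid + 1, hi);
}

// Assumption: k - 1 sorted splitters are given, where k (the bucket count)
// is a power of two. tree[0] is unused; tree[1] is the root.
std::vector<int> make_splitter_tree(const std::vector<int>& sorted_splitters) {
    std::size_t k = sorted_splitters.size() + 1;
    assert(k >= 2 && (k & (k - 1)) == 0);
    std::vector<int> tree(k, 0);
    build_tree(sorted_splitters, tree, 1, 0, sorted_splitters.size());
    return tree;
}

// Return the bucket index in [0, k) for x. The comparison result (0 or 1) is
// added to the index, so the only branch left is the loop condition, which
// runs a fixed log2(k) times and is therefore easy to predict.
std::size_t classify(const std::vector<int>& tree, int x) {
    std::size_t k = tree.size();
    std::size_t i = 1;
    while (i < k) {
        i = 2 * i + static_cast<std::size_t>(x > tree[i]);
    }
    return i - k;
}
```

With splitters 10, 20, 30, elements up to 10 land in bucket 0, elements in (10, 20] in bucket 1, and so on. Whether a compiler actually emits branch-free code for the comparison depends on the target and optimization level.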
Similar resources
Super Scalar Sample Sort
Sample sort, a generalization of quicksort that partitions the input into many pieces, is known as the best practical comparison based sorting algorithm for distributed memory parallel computers. We show that sample sort is also useful on a single processor. The main algorithmic insight is that element comparisons can be decoupled from expensive conditional branching using predicated instructio...
An Optimal Fuzzy Sliding Mode Controller Design Based on Particle Swarm Optimization and Using Scalar Sign Function
This paper addresses the problems caused by an inappropriate selection of sliding surface parameters in fuzzy sliding mode controllers via an optimization approach. In particular, the proposed method employs the parallel distributed compensator scheme to design the state feedback based control law. The controller gains are determined in offline mode via a linear quadratic regulator. The particle ...
Portability of performance with the BSPLib communications library
The BSP cost model makes a new level of power available for designing parallel algorithms. First, it models the actual behaviour of today’s parallel computers, and so can be used to choose appropriate algorithms without completely implementing them. Second, it becomes possible to characterise the range of architecture performance over which a particular algorithm is the best choice. This provid...
Sorting on a Massively Parallel System Using a Library of Basic Primitives: Modeling and Experimental Results
We present a comparative study of implementations of the following sorting algorithms on the Parsytec SC320 reconfigurable, asynchronous, massively parallel MIMD machine: Bitonic Sort, Odd-Even Merge Sort, Odd-Even Merge Sort with guarded split&merge, and two variants of Samplesort. The experiments are performed on 2- up to 5-dimensional wrapped butterfly networks with 8 up to 160 processors. We ...
BlockQuicksort: Avoiding Branch Mispredictions in Quicksort
Since the work of Kaligosi and Sanders (2006), it is well-known that Quicksort – which is commonly considered as one of the fastest in-place sorting algorithms – suffers in an essential way from branch mispredictions. We present a novel approach to address this problem by partially decoupling control from data flow: in order to perform the partitioning, we split the input in blocks of constant ...
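The idea of decoupling control from data flow during partitioning can be sketched as follows: each block is scanned with a loop that always stores the current offset and advances the buffer position by the comparison result, and the buffered offsets are then swapped pairwise without any data-dependent branch. The block size `kBlock`, the function name `block_partition`, the `int` keys, and the fallback to `std::partition` for the final stretch are all assumptions chosen for brevity; this is a simplified illustration, not the algorithm as published.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

constexpr std::ptrdiff_t kBlock = 64;  // hypothetical block size
static_assert(kBlock <= 256, "offsets must fit in unsigned char");

// Rearrange `a` so that all elements < pivot precede all elements >= pivot.
// Returns the index of the first element >= pivot.
std::size_t block_partition(std::vector<int>& a, int pivot) {
    unsigned char off_l[kBlock], off_r[kBlock];
    std::ptrdiff_t l = 0, r = static_cast<std::ptrdiff_t>(a.size()) - 1;
    std::ptrdiff_t num_l = 0, num_r = 0, start_l = 0, start_r = 0;

    while (r - l + 1 > 2 * kBlock) {
        if (num_l == 0) {  // refill: offsets of left-block elements >= pivot
            start_l = 0;
            for (std::ptrdiff_t i = 0; i < kBlock; ++i) {
                off_l[num_l] = static_cast<unsigned char>(i);
                num_l += (a[l + i] >= pivot);  // comparison used as 0/1
            }
        }
        if (num_r == 0) {  // refill: offsets of right-block elements < pivot
            start_r = 0;
            for (std::ptrdiff_t i = 0; i < kBlock; ++i) {
                off_r[num_r] = static_cast<unsigned char>(i);
                num_r += (a[r - i] < pivot);
            }
        }
        // Swap as many misplaced pairs as both buffers can supply.
        std::ptrdiff_t num = std::min(num_l, num_r);
        for (std::ptrdiff_t j = 0; j < num; ++j) {
            std::swap(a[l + off_l[start_l + j]], a[r - off_r[start_r + j]]);
        }
        num_l -= num; num_r -= num;
        start_l += num; start_r += num;
        if (num_l == 0) l += kBlock;  // left block is now entirely < pivot
        if (num_r == 0) r -= kBlock;  // right block is now entirely >= pivot
    }

    // Everything before l is < pivot and everything after r is >= pivot, so
    // an ordinary partition of the remaining range completes the job.
    assert(l >= 0 && r + 1 >= l);
    auto mid = std::partition(a.begin() + l, a.begin() + (r + 1),
                              [pivot](int x) { return x < pivot; });
    return static_cast<std::size_t>(mid - a.begin());
}
```

Finishing with `std::partition` keeps the sketch short at the cost of rescanning up to two blocks with ordinary, branching comparisons.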